
ctran: enable per-block flag for broadcast with tcpdm/unpack#2079

Closed
fomichev wants to merge 2 commits into meta-pytorch:main from fomichev:export-D99452874

Conversation

@fomichev

Summary:
The TCPDM broadcast kernel (ncclKernelBroadcast<UNPACK=true>) hangs
with multiple GPUs because all 8 CUDA thread blocks share a single
kernel flag (flag[0]), creating a race condition on termination.

Send/recv kernels don't have this problem because they use per-block
flags (flag[blockIdx.x]): each block signals and terminates on its
own flag slot, so block 0 clearing its slot doesn't affect other blocks.

Relevant for TCPDM only because we have more than one block (due to unpack).
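The termination scheme described above can be illustrated with a minimal Python sketch. This is not the NCCL kernel itself: it simulates the per-block flag layout (`flag[blockIdx.x]`) with host threads, where `NUM_BLOCKS`, `DONE`, and `block` are illustrative names. Each simulated block spins on, and clears, only its own flag slot, so no block's cleanup can race another block's termination check, which is the property the shared `flag[0]` scheme lacks.

```python
import threading

NUM_BLOCKS = 8   # mirrors the 8 CUDA thread blocks in the broadcast kernel
DONE = 1

# Per-block flag slots: each simulated "block" polls and clears only its
# own slot, analogous to flag[blockIdx.x] in the send/recv kernels.
flags = [0] * NUM_BLOCKS
results = [False] * NUM_BLOCKS

def block(idx: int) -> None:
    # Spin until the "host" signals this block's own slot.
    while flags[idx] != DONE:
        pass
    flags[idx] = 0       # clearing our own slot cannot affect other blocks
    results[idx] = True

threads = [threading.Thread(target=block, args=(i,)) for i in range(NUM_BLOCKS)]
for t in threads:
    t.start()

# "Host" side: signal each block's slot independently.
for i in range(NUM_BLOCKS):
    flags[i] = DONE

for t in threads:
    t.join()
```

With a single shared slot instead (every block spinning on `flags[0]`), the first block to observe `DONE` and reset the slot to 0 would leave the remaining blocks spinning forever, which is the hang this change fixes.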

Reviewed By: function47

Differential Revision: D99452874

Stanislav Fomichev and others added 2 commits April 14, 2026 08:22
Summary:
I know this is not something that's going to be supported for long or even used in prod, but it works with minimal code changes, so re-enabled v2.29 for conda with iter builds.

901 + ncclx + tcpdm, conda package: 0f473b3

Differential Revision: D100339239
@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Apr 15, 2026
@meta-codesync
Contributor

meta-codesync bot commented Apr 15, 2026

@fomichev has exported this pull request. If you are a Meta employee, you can view the originating Diff in D99452874.

@meta-codesync
Contributor

meta-codesync bot commented Apr 16, 2026

This pull request has been merged in a5eae6f.

